fix: browse() and list_datablocks() for V3 multi-frame EXPLORE (S7-1200 FW V4.5)#753
fix: browse() and list_datablocks() for V3 multi-frame EXPLORE (S7-1200 FW V4.5)#753tommasofaedo wants to merge 3 commits into
Conversation
tommasofaedo
commented
Jun 19, 2026
…00 FW V4.5) On V3 PLCs (FW >= V4.5) the EXPLORE response for RID 0x8A11FFFF spans multiple TPKT frames and uses a zlib-compressed PlcContentInfo XML format instead of the PObject tree expected by _parse_explore_datablocks(). The existing reassemble=True path does not strip V3 HMAC prefixes from continuation frames, so list_datablocks() returned [] on these PLCs. Changes: connection.py: - Add collect_explore_frames(): collects V3 multi-fragment EXPLORE responses by receiving continuation frames and stripping their HMAC prefix, stopping when a shorter-than-reference frame is detected. _s7commplus_client.py: - Add _build_explore_payload_v3(): VLQ-encoded EXPLORE payload for V3 PLCs (required format for 0x8A11FFFF and per-DB RID explores). - Add _parse_explore_datablocks_xml(): decompresses the zlib PlcContentInfo XML blob and extracts Entity[@id="Block"][@type="DB"] entries; falls back to _parse_explore_datablocks() when no zlib magic is found. - list_datablocks(): when protocol_version >= V3, use _build_explore_payload_v3 + collect_explore_frames + _parse_explore_datablocks_xml. - browse(): when protocol_version >= V3, use V3 payload builder and frame collector for each per-DB EXPLORE. - _parse_explore_fields(): three fixes for V3 PLCs: * Accept WSTRING dtype 0x15 in addition to 0x13 for name attributes. * Auto-detect encoding: UTF-8 (V3, no null bytes) vs UTF-16-BE (V1/V2). * BLOB skip: account for the extra 0x00 byte V3 PLCs insert before VLQ len. * WSTRING skip: advance past string data bytes (was only skipping VLQ). Tested on S7-1200 CPU 1212C DC/DC/DC, firmware V4.5 (V3 protocol, no TLS): - list_datablocks() now returns [{"name": "Data_block_1", "number": 100, "rid": 2316173412}] where it previously returned []. - The PlcContentInfo XML (6131 bytes after decompression) is correctly parsed from a 3-frame response (first 946-byte frame + two continuations). Known limitation: on FW V4.5, DB field definitions and I/Q/M tag names are stored in zlib BLOBs with a Siemens preset dictionary (magic 78 7D, FDICT flag set). Python zlib.decompress() returns Z_NEED_DICT. browse() returns DB names/numbers but cannot enumerate individual field names on V3 PLCs.
gijzelaerr
left a comment
There was a problem hiding this comment.
Review Summary
This PR adds V3 (S7-1200 FW V4.5) support for list_datablocks() and browse() — three distinct fixes for multi-frame collection, zlib-compressed XML parsing, and _parse_explore_fields V3 attribute encoding. Real-hardware tested. No malicious code.
Issues to address:
1. No unit tests. This is the biggest gap. The XML parser, the multi-frame collector, and the _parse_explore_fields fixes all have zero test coverage. At minimum: a test for _parse_explore_datablocks_xml with a synthetic zlib-compressed XML blob, and a test for the WSTRING/BLOB skip fixes.
2. XML entity expansion (XXE). ET.fromstring() uses the default parser which resolves external entities. Since the XML comes from a PLC (not user input), the risk is low, but for defense-in-depth consider defusedxml or at least ET.XMLParser(resolve_entities=False) — a malicious response could trigger entity expansion.
3. collect_explore_frames fragment detection is fragile. The "body shorter than reference by >5 bytes = last fragment" heuristic assumes all full-size frames are within 5 bytes of each other. If the PLC sends a legitimately short intermediate frame, it would be misdetected as the last. A more robust approach: use the V3 protocol's own termination signal (if available) or at least add a max-frame-count guard to prevent infinite loops.
4. collect_explore_frames has no size/count limits. A malformed V3 response could drive unbounded memory allocation. Add caps similar to _recv_reassembled_payload (16 MiB / 4096 fragments).
5. _build_explore_payload_v3 uses VLQ for ExploreId. The existing _build_explore_request was just fixed (in #749) to use fixed UInt32 for ExploreId because that's what real PLCs expect. Using VLQ here may work for V3 but diverges from the corrected convention — is this intentional?
6. BLOB skip offset += 1 for the extra 0x00 byte is V3-specific but runs unconditionally in _parse_explore_fields. If a V1/V2 response has a BLOB attribute, this would skip one byte too many. Guard it behind a V3 check.
7. Async parity. S7CommPlusAsyncClient.list_datablocks() and browse() are not updated — async callers on V3 PLCs still get empty results.
Positive notes:
- Uses
xml.etree.ElementTree(safe, stdlib) - V1/V2 fallback path preserved correctly
- The zlib magic detection (
78 DA) is sound - No legacy
snap7/files touched
Not ready to merge — needs tests, size caps, and the BLOB skip V3 guard.
| xml_bytes = zlib.decompress(response[zlib_pos:]) | ||
| except zlib.error as exc: | ||
| logger.debug(f"_parse_explore_datablocks_xml: zlib error {exc}") | ||
| return [] |
There was a problem hiding this comment.
ET.fromstring() uses the default XML parser which resolves external entities. While the XML comes from a PLC, for defense-in-depth consider at minimum disabling entity resolution. Python 3.8+ ET.fromstring is safe against XXE by default (entities are not expanded), so this is low-risk — but worth a comment noting the assumption.
| # V3 non-TLS: strip the HMAC prefix ([hash_len][hash_bytes]) | ||
| if self._protocol_version >= ProtocolVersion.V3 and len(body) > 33: | ||
| hash_len = body[0] | ||
| body = body[1 + hash_len :] |
There was a problem hiding this comment.
No size or fragment-count cap. A malformed V3 response could loop indefinitely and allocate unbounded memory. Add limits similar to _recv_reassembled_payload (_MAX_REASSEMBLED_FRAGMENTS / _MAX_REASSEMBLED_BYTES).
Also, the "body shorter than reference by >5 bytes" heuristic is fragile — if the PLC sends a legitimately shorter intermediate frame, collection stops early and silently truncates the response.
| break | ||
| count, consumed = _vlq32(response, offset) | ||
| offset += consumed | ||
| offset += count |
There was a problem hiding this comment.
The offset += 1 for the extra 0x00 byte before BLOB VLQ length is V3-specific, but this code runs for all protocol versions. If a V1/V2 EXPLORE response contains a BLOB attribute, this will skip one byte too many and corrupt all subsequent parsing. Guard with a V3 check, or pass the protocol version into this function.
…n to collect_explore_frames - Add fragment count and byte size caps using _MAX_REASSEMBLED_FRAGMENTS / _MAX_REASSEMBLED_BYTES (same limits already used by _recv_reassembled_payload) - Add frag_len == 0 check as primary end-of-stream trailer detection - Keep shorter-than-full-frame heuristic as fallback - Update docstring to document termination and safety limits
… comment, pass protocol_version - _parse_explore_fields: add protocol_version param (default 0 = backward compat) Guard the BLOB extra-0x00 skip with `if protocol_version >= ProtocolVersion.V3` so V1/V2 EXPLORE responses with BLOB attributes are not mis-parsed - browse(): pass self._connection._protocol_version to _parse_explore_fields - _parse_explore_datablocks_xml: add comment noting ET.fromstring XXE safety assumption (safe by default in Python 3.8+, XML source is the PLC)
|
Thank you for the detailed review! I've pushed two commits addressing all three points. 1. Addressed in commit
2. Addressed in commit
3. Addressed in commit Let me know if anything else needs adjustment. |
|
Thanks for the thorough follow-up — all three points look well-addressed:
Two remaining items from the original review that are still open:
|